4 research outputs found
Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond
We propose task-adaptive tokenization as a way to adapt the generation
pipeline to the specifics of a downstream task and enhance long-form generation
in mental health. Inspired by insights from cognitive science, our
task-adaptive tokenizer samples variable segmentations from multiple outcomes,
with sampling probabilities optimized based on task-specific data. We introduce
a strategy for building a specialized vocabulary and a vocabulary
merging protocol that allows for the integration of task-specific tokens into
the pre-trained model's tokenization step. Through extensive experiments on
psychological question-answering tasks in both Chinese and English, we find
that our task-adaptive tokenization approach brings a significant improvement
in generation performance while using up to 60% fewer tokens. Preliminary
experiments point to promising results when using our tokenization approach
with very large language models.
Comment: Accepted at the main conference of the 2023 Conference on Empirical
Methods in Natural Language Processing; 8 page
HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models
Theory of Mind (ToM) is the ability to reason about one's own and others'
mental states. ToM plays a critical role in the development of intelligence,
language understanding, and cognitive processes. While previous work has
primarily focused on first and second-order ToM, we explore higher-order ToM,
which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a
Higher-Order Theory of Mind benchmark. Our experimental evaluation using
various Large Language Models (LLMs) indicates a decline in performance on
higher-order ToM tasks, demonstrating the limitations of current LLMs. We
conduct a thorough analysis of different failure cases of LLMs, and share our
thoughts on the implications of our findings for the future of NLP.
Comment: Accepted at Findings of EMNLP 202
You Are What You Annotate: Towards Better Models through Annotator Representations
Annotator disagreement is ubiquitous in natural language processing (NLP)
tasks. There are multiple reasons for such disagreements, including the
subjectivity of the task, difficult cases, unclear guidelines, and so on.
Rather than simply aggregating labels to obtain data annotations, we instead
try to directly model the diverse perspectives of the annotators, and
explicitly account for annotators' idiosyncrasies in the modeling process by
creating representations for each annotator (annotator embeddings) and also
their annotations (annotation embeddings). In addition, we propose TID-8, The
Inherent Disagreement - 8 dataset, a benchmark that consists of eight existing
language understanding datasets that have inherent annotator disagreement. We
test our approach on TID-8 and show that our approach helps models learn
significantly better from disagreements on six different datasets in TID-8
while increasing the parameter count by less than 1%. By capturing the
unique tendencies and subjectivity of individual annotators through embeddings,
our representations prime AI models to be inclusive of diverse viewpoints.
Comment: Accepted to Findings of EMNLP 202
The Cross-lingual Conversation Summarization Challenge
We propose the shared task of cross-lingual conversation summarization,
the ConvSumX Challenge, opening new avenues for researchers to investigate
solutions that integrate conversation summarization and machine translation.
This task can be particularly useful due to the emergence of online meetings
and conferences. We construct a new benchmark, covering 2 real-world scenarios
and 3 language directions, including a low-resource language. We hope that
ConvSumX can motivate researchers to go beyond English and break down the
barriers that keep non-English speakers from benefiting from recent advances in
conversation summarization.